Olli Virmajoki: Pairwise Nearest Neighbor Method Revisited

Authors

  • Martti Juhola
  • Olli Virmajoki
Abstract

The pairwise nearest neighbor (PNN) method, also known as Ward's method, belongs to the class of agglomerative clustering methods. It generates a hierarchical clustering through a sequence of merge operations until the desired number of clusters is obtained. At each step, the method merges the cluster pair whose merge increases the given objective function value the least. The main drawback of the PNN method is its slowness: the time complexity of the fastest known exact implementation is lower bounded by Ω(N²), where N is the number of data objects. In the first publication, we consider several speed-up methods for the PNN method that preserve the precision of the method. Another way of speeding up the PNN method is investigated in the second publication, where we utilize a k-neighborhood graph to reduce the number of distance calculations and operations. A remarkable speed-up is achieved at the cost of a slight increase in distortion. The PNN method can also be adapted for multilevel thresholding, which can be seen as a one-dimensional special case of the clustering problem. In the third publication, we show how this can be implemented efficiently in only O(N log N) time, compared with O(N²) for a straightforward approach. In the fourth publication, the merge philosophy is extended by the iterative shrinking method. In the merge phase of the PNN method, the two nearest clusters are always joined. Instead, the iterative shrinking method removes one cluster at a time and reassigns its data objects to the neighboring clusters. This gives better clustering results, at the cost of increased running time. The proposed method is also used as a crossover method in a genetic algorithm, which produces the best clustering results with respect to the minimization of intra-cluster variance. The PNN algorithm can also be applied to generating optimal clustering.
In the fifth publication, we use a branch-and-bound technique for finding the best possible...
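The merge criterion can be made concrete with a minimal sketch of a naive PNN in plain Python. It assumes that the objective function is the total squared error (intra-cluster variance), for which merging clusters a and b with sizes n_a, n_b and centroids c_a, c_b increases the error by n_a·n_b/(n_a + n_b)·‖c_a − c_b‖². The names `sq_dist`, `merge_cost`, and `pnn` are hypothetical. Every step scans all cluster pairs, so this version needs O(N³) time and only illustrates the criterion, not the fast exact implementations discussed in the publications.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def merge_cost(n_a, c_a, n_b, c_b):
    """Increase in total squared error when clusters a and b (sizes n_*, centroids c_*) merge."""
    return n_a * n_b / (n_a + n_b) * sq_dist(c_a, c_b)


def pnn(points, m):
    """Naive PNN: start from singleton clusters and merge the cheapest pair until m remain."""
    sizes = [1] * len(points)
    cents = [tuple(map(float, p)) for p in points]
    members = [[i] for i in range(len(points))]
    while len(cents) > m:
        # Exhaustive search over all cluster pairs for the smallest merge cost.
        best = None
        for a in range(len(cents)):
            for b in range(a + 1, len(cents)):
                cost = merge_cost(sizes[a], cents[a], sizes[b], cents[b])
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        n = sizes[a] + sizes[b]
        # The merged centroid is the size-weighted mean of the two centroids.
        cents[a] = tuple((sizes[a] * x + sizes[b] * y) / n for x, y in zip(cents[a], cents[b]))
        sizes[a] = n
        members[a] += members[b]
        del sizes[b], cents[b], members[b]
    return members
```

For example, `pnn([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` merges the two close pairs and returns `[[0, 1], [2, 3]]`, the point indices of each cluster.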

Similar resources

Fast pairwise nearest neighbor based algorithm for multilevel thresholding

We propose a fast pairwise nearest neighbor (PNN)-based O(N log N) time algorithm for multilevel nonparametric thresholding, where N denotes the size of the image histogram. The proposed PNN-based multilevel thresholding algorithm is considerably faster than optimal thresholding. On a set of 8 to 16 bits-per-pixel real images, experimental results also reveal that the proposed method provides be...
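An O(N log N) bound of this kind becomes plausible once clusters are kept as contiguous intervals of gray levels and candidate merges sit in a priority queue. The sketch below is written under those assumptions and is not the published algorithm: it only merges adjacent intervals so that each class stays a contiguous range, uses the squared-error merge cost, and skips outdated heap entries lazily with per-interval version counters. `pnn_thresholds` is a hypothetical name, `hist[i]` is assumed to be the pixel count of gray level i, and a returned threshold t means that levels ≤ t belong to the lower class.

```python
import heapq


def pnn_thresholds(hist, m):
    """Hypothetical PNN-style thresholding of a histogram into m classes (adjacent merges only)."""
    bins = [i for i, count in enumerate(hist) if count > 0]
    k = len(bins)
    if k == 0:
        return []
    n = [hist[i] for i in bins]          # pixel count of each interval
    s = [hist[i] * i for i in bins]      # sum of gray levels, so the mean is s / n
    hi = bins[:]                         # last gray level of each interval
    prev = list(range(-1, k - 1))        # doubly linked list of live intervals
    nxt = list(range(1, k)) + [-1]
    ver = [0] * k                        # bumped whenever an interval changes
    alive = [True] * k

    def cost(a, b):
        # Increase in squared error when intervals a and b are merged.
        d = s[a] / n[a] - s[b] / n[b]
        return n[a] * n[b] / (n[a] + n[b]) * d * d

    heap = [(cost(a, a + 1), a, a + 1, 0, 0) for a in range(k - 1)]
    heapq.heapify(heap)
    classes = k
    while classes > m and heap:
        _, a, b, va, vb = heapq.heappop(heap)
        if not (alive[a] and alive[b]) or ver[a] != va or ver[b] != vb:
            continue  # stale entry: an interval changed after this entry was pushed
        # Merge interval b into its left neighbor a.
        n[a] += n[b]
        s[a] += s[b]
        hi[a] = hi[b]
        alive[b] = False
        ver[a] += 1
        nxt[a] = nxt[b]
        if nxt[b] != -1:
            prev[nxt[b]] = a
        classes -= 1
        # Only the two new adjacent pairs need fresh merge costs.
        if prev[a] != -1:
            p = prev[a]
            heapq.heappush(heap, (cost(p, a), p, a, ver[p], ver[a]))
        if nxt[a] != -1:
            q = nxt[a]
            heapq.heappush(heap, (cost(a, q), a, q, ver[a], ver[q]))
    # Interval 0 is never absorbed, so it heads the list; each non-final interval ends at a threshold.
    thresholds, a = [], 0
    while nxt[a] != -1:
        thresholds.append(hi[a])
        a = nxt[a]
    return thresholds
```

For a histogram with two separated peaks, `pnn_thresholds([5, 5, 0, 0, 0, 0, 0, 0, 5, 5], 2)` returns `[1]`. Each merge pushes at most two heap entries, so the heap holds O(N) entries and the loop runs in O(N log N) time.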

Iterative shrinking method for generating clustering

The pairwise nearest neighbor method (PNN) generates the clustering of a given data set by a sequence of merge steps. In this paper, we propose an alternative to the merge-based approach by introducing an iterative shrinking method. The new method removes the clusters iteratively one by one until the desired number of clusters is reached. Instead of merging two nearby clusters, we remo...
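The difference from merging can be illustrated with a simplified sketch. At each step it estimates, for every cluster, the change in total squared error if that cluster were removed and each of its points moved to the cheapest other cluster, and then removes the cluster with the smallest change. As a simplifying assumption, the per-point addition cost n_j/(n_j + 1)·‖x − c_j‖² is computed with all centroids held fixed during one removal, so the estimate is approximate; the published method's cost calculation may differ. `iterative_shrinking` is a hypothetical name, m is assumed to be at least 1, and everything is recomputed each step for clarity rather than speed.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def centroid(pts):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(pts) for c in zip(*pts))


def iterative_shrinking(points, m):
    """Remove clusters one by one, reassigning their points, until m clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > m:
        cents = [centroid([points[i] for i in c]) for c in clusters]

        def add_cost(i, j):
            # Approximate increase in squared error when point i joins cluster j.
            nj = len(clusters[j])
            return nj / (nj + 1) * sq_dist(points[i], cents[j])

        best = None
        for s, members in enumerate(clusters):
            others = [j for j in range(len(clusters)) if j != s]
            # Removing s deletes its own squared error...
            delta = -sum(sq_dist(points[i], cents[s]) for i in members)
            moves = []
            for i in members:
                # ...and adds the cost of placing each point in its cheapest other cluster.
                j = min(others, key=lambda j: add_cost(i, j))
                delta += add_cost(i, j)
                moves.append((i, j))
            if best is None or delta < best[0]:
                best = (delta, s, moves)
        _, s, moves = best
        for i, j in moves:
            clusters[j].append(i)
        del clusters[s]
    return clusters
```

Unlike a merge, a single removal may split a cluster's points among several neighboring clusters.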

Practical methods for speeding-up the pairwise nearest neighbor method

Timo Kaukoranta, University of Turku, Turku Center for Computer Science, Department of Computer Science, Lemminkäisenkatu 14A, FIN-20520 Turku, Finland. Abstract: The pairwise nearest neighbor (PNN) method is a simple and well-known method for codebook generation in vector quantization. In its exact form, it provides a good-quality codebook but at the cost of high run time. A fast exact algorithm was...
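One generic way to speed up the pair search while keeping the result exact is partial distance search: the squared distance is accumulated one coordinate at a time, and the calculation stops as soon as the partial sum shows that the pair cannot beat the best merge cost found so far. The snippet does not say whether this trick is among the paper's methods, so treat it only as an illustration of an exactness-preserving speed-up. `merge_cost_pds` and `nearest_pair_pds` are hypothetical names, and the merge cost is assumed to be the squared-error (Ward) cost.

```python
import math


def merge_cost_pds(n_a, c_a, n_b, c_b, best):
    """Ward merge cost of clusters a and b, or None if it cannot be smaller than `best`."""
    w = n_a * n_b / (n_a + n_b)
    limit = best / w              # squared-distance budget that would still beat `best`
    acc = 0.0
    for x, y in zip(c_a, c_b):
        acc += (x - y) ** 2
        if acc >= limit:
            return None           # early exit: remaining coordinates can only add more
    cost = w * acc
    return cost if cost < best else None


def nearest_pair_pds(sizes, cents):
    """Exhaustive search for the cheapest merge, pruned with partial distance search."""
    best, pair = math.inf, None
    for a in range(len(cents)):
        for b in range(a + 1, len(cents)):
            cost = merge_cost_pds(sizes[a], cents[a], sizes[b], cents[b], best)
            if cost is not None:
                best, pair = cost, (a, b)
    return best, pair
```

Because a pair is discarded only when its partial sum already reaches the current bound, the search returns the same pair as a full search; the savings tend to grow with the dimensionality of the data.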

Fast PNN-based Clustering Using K-nearest Neighbor Graph

The search for the nearest neighbor is the main source of computation in most clustering algorithms. We propose the use of a nearest neighbor graph to reduce the number of candidates. The number of distance calculations per search can be reduced from O(N) to O(k), where N is the number of clusters and k is the number of neighbors in the graph. We apply the proposed scheme within agglomerative clusteri...
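A minimal sketch of the idea, assuming a symmetric k-nearest-neighbor graph built on the initial points and the squared-error (Ward) merge cost: merge candidates are restricted to graph edges, and after a merge the new cluster inherits the neighbors of both parents. `pnn_knn_graph` is a hypothetical name. For readability the graph is built by brute force and all edge costs are re-evaluated at every step, whereas a practical implementation would cache each cluster's nearest candidate. Because candidates are restricted, the result can differ from exact PNN, in line with the distortion trade-off mentioned in the abstract.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def ward(n_a, c_a, n_b, c_b):
    """Increase in total squared error when clusters a and b merge."""
    return n_a * n_b / (n_a + n_b) * sq_dist(c_a, c_b)


def pnn_knn_graph(points, m, k):
    """Approximate PNN that only considers merges along edges of a k-NN graph."""
    n = len(points)
    size = {i: 1 for i in range(n)}
    cent = {i: tuple(map(float, p)) for i, p in enumerate(points)}
    members = {i: [i] for i in range(n)}
    nbr = {i: set() for i in range(n)}
    for i in range(n):
        closest = sorted((j for j in range(n) if j != i),
                         key=lambda j: sq_dist(points[i], points[j]))[:k]
        for j in closest:
            nbr[i].add(j)
            nbr[j].add(i)  # keep the graph symmetric
    next_id = n
    while len(size) > m:
        # Candidate merges are graph edges only, not all cluster pairs.
        edges = [(ward(size[a], cent[a], size[b], cent[b]), a, b)
                 for a in nbr for b in nbr[a] if a < b]
        if not edges:
            break  # the graph is disconnected: no candidate pair is left
        _, a, b = min(edges)
        new = next_id
        next_id += 1
        size[new] = size[a] + size[b]
        cent[new] = tuple((size[a] * x + size[b] * y) / size[new]
                          for x, y in zip(cent[a], cent[b]))
        members[new] = members[a] + members[b]
        # The merged cluster inherits the neighbors of both parents.
        nbr[new] = (nbr[a] | nbr[b]) - {a, b}
        for c in nbr[new]:
            nbr[c] -= {a, b}
            nbr[c].add(new)
        for table in (size, cent, members, nbr):
            del table[a], table[b]
    return list(members.values())
```

With `k=1`, the graph for `[(0, 0), (0, 1), (10, 10), (10, 11)]` splits into two components, so asking for a single cluster stops at two; the connectivity of the graph limits how far the sketch can merge.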

Fast and space efficient PNN algorithm with delayed distance calculations

Clustering of a data set can be done by the well-known Pairwise Nearest Neighbor (PNN) algorithm. The algorithm is conceptually very simple and gives high-quality solutions. A drawback of the method is the relatively large running time of the original (exact) implementation. Recently, an efficient version of the exact PNN algorithm has been introduced in the literature. In this paper we give a fa...
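One way to delay distance calculations, sketched here under stated assumptions rather than as the published algorithm: each cluster keeps a pointer to its nearest neighbor, and a priority queue holds a lower bound of each cluster's nearest-neighbor cost. After a merge, clusters whose pointer referred to one of the merged clusters are only marked stale, and their nearest neighbor is recomputed when they reach the top of the queue. This relies on the reducibility of Ward's criterion: when the closest pair a, b is merged, the cost from any third cluster k to the merged cluster is at least min(cost(k, a), cost(k, b)). Stale queue values therefore remain valid lower bounds, and the merges match exhaustive PNN up to tie-breaking. `pnn_delayed` is a hypothetical name, and m is assumed to be at least 1.

```python
import heapq


def sq_dist(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def ward(n_a, c_a, n_b, c_b):
    """Increase in total squared error when clusters a and b merge."""
    return n_a * n_b / (n_a + n_b) * sq_dist(c_a, c_b)


def pnn_delayed(points, m):
    """Exact PNN with nearest-neighbor pointers whose recomputation is postponed."""
    size = {i: 1 for i in range(len(points))}
    cent = {i: tuple(map(float, p)) for i, p in enumerate(points)}
    members = {i: [i] for i in range(len(points))}
    nn = {}                          # cluster -> (cost, neighbor); valid only if not stale
    stale = set(size)                # no pointer has been computed yet
    heap = [(0.0, i) for i in size]  # 0.0 is a trivial lower bound
    heapq.heapify(heap)
    next_id = len(points)

    def nearest(a):
        return min((ward(size[a], cent[a], size[b], cent[b]), b) for b in size if b != a)

    while len(size) > m:
        _, a = heapq.heappop(heap)
        if a not in size:
            continue                 # entry of a cluster that has been merged away
        if a in stale:
            nn[a] = nearest(a)       # the delayed distance calculation happens here
            stale.discard(a)
            heapq.heappush(heap, (nn[a][0], a))
            continue
        # a's cost is exact and is the smallest bound in the queue, so (a, b) is the best pair.
        cost, b = nn[a]
        new = next_id
        next_id += 1
        size[new] = size[a] + size[b]
        cent[new] = tuple((size[a] * x + size[b] * y) / size[new]
                          for x, y in zip(cent[a], cent[b]))
        members[new] = members[a] + members[b]
        for table in (size, cent, members):
            del table[a], table[b]
        nn.pop(a, None)
        nn.pop(b, None)
        # Clusters that pointed to a or b are only marked; their old cost stays as a lower bound.
        stale.update(c for c, (_, target) in nn.items() if target in (a, b))
        stale.add(new)
        heapq.heappush(heap, (cost, new))  # the merge cost is a lower bound for the new cluster
    return list(members.values())
```

Only the clusters that actually reach the top of the queue pay for a full nearest-neighbor search, which is where the savings come from.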

Publication date: 2004